-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: ExtensionArray support for objects with _can_hold_na=False and relational operators #20801
Conversation
One issue here is for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The NA stuff looks good. Small question about the test changes.
Taking a look at the ops stuff in a bit.
* _formatting_values | ||
* _can_hold_na | ||
|
||
Some methods require casting the ExtensionArray to an ndarray of Python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merge snafu?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I reordered _formatting_values
and _can_hold_na
, since one is a method and the other is an attribute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant this section specifically. This block is repeated starting on line 57. Or perhaps I'm missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really weird. The copy on my machine is clean and fine, but the copy on GitHub has the repeat. Maybe because I resolved conflicts with master using GitHub interface. UGH.
# type: () -> bool | ||
"""Whether your array can hold missing values. True by default. | ||
_can_hold_na = True | ||
"""Whether your array can hold missing values. True by default. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what our recommended style is here. You can probably just change it to a comment.
@@ -82,8 +82,9 @@ def test_getitem_scalar(self, data): | |||
assert isinstance(result, data.dtype.type) | |||
|
|||
def test_getitem_scalar_na(self, data_missing, na_cmp, na_value): | |||
result = data_missing[0] | |||
assert na_cmp(result, na_value) | |||
if data_missing._can_hold_na: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these changes are a bit unfortunate...
Could we instead have the data_missing
fixture raise pytest.skip
when necessary? I suspect this would have to be done by the EA author in their code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could make that change to avoid any tests where data_missing
is used, and the EA author would have to make that change if they wanted _can_hold_na=False
.
|
||
|
||
@pytest.fixture | ||
def data_missing_for_sorting(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I clearly didn't have arrays that don't support NA in mind when I wrote this :)
Could you add this note to the data_missing_for_sorting
docstring (in tests/extensions/conftest.py
).?
this won't be for 0.23, you are basically defining the ops protocol for abstraction, which is needed, but also needs substantial testing in the current EA classes. |
Yeah, I think I agree given the deadline we set (RC was supposed to be yesterday). I could see this taking a bit of time to get right. @Dr-Irv could you split the changes for |
@TomAugspurger Yes, I can split the changes out for the |
Yeah, that'd be nice.
It can be a very simple example I think... Maybe make a subclass of
`DecimalArray` with `_can_hold_na=False` and see if the tests pass once you
apply your changes?
…On Tue, Apr 24, 2018 at 10:09 AM, Dr. Irv ***@***.***> wrote:
@TomAugspurger <https://github.com/TomAugspurger> Yes, I can split the
changes out for the _can_hold_na part. And I see that the operator stuff
I tried to do is part of a bigger discussion. Should I create a new test
case for the _can_hold_na part?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#20801 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHIsmL3_cFNRvq4fhiEo-UBQz90xGfks5tr0AWgaJpZM4Tge6g>
.
|
Closing this as it the |
closes #20659
closes #20761
git diff upstream/master -u -- "*.py" | flake8 --diff
NOTE: I expect that the pandas main developers will object to a lot of what is done here! So let's be open on the discussion.
Goals of this pull request:
ExtensionArray
, allow the relational operators to delegate to the base typeExtensionArray
tests to handle the cases of_can_hold_na==False
as best as possible.RelObjectArray
that consists of objects that have operators such as__le__
defined that return objects, anda
andb
areSeries
containing that type of array, thena <= b
returns aSeries
of objects rather than booleans.Things I'm unsure of:
inspect
to determine whether the user has implemented the relational operators for the subclassedExtensionArray
type as well as for the base type of elements in the array. I'm not sure if this is best practice or not.__eq__
) return objects as opposed to booleans, and some (intentionally) throw exceptions. So I tried to catch these various cases. Not having__lt__
defined means that sorting is undefined, and not having__eq__
return a boolean messes up some things related togroupby
. For my application, that is OK.